Dataset statistics
| Number of variables | 19 |
|---|---|
| Number of observations | 16228 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 2.0 MiB |
| Average record size in memory | 132.0 B |
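As a rough consistency check on the two memory figures above, the per-record average multiplied by the number of observations should reproduce the reported total. The sketch below assumes that MiB means 2**20 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_observations = 16228
avg_record_bytes = 132.0  # "Average record size in memory"

# Assumption: 1 MiB = 2**20 bytes.
total_mib = n_observations * avg_record_bytes / 2**20

# Rounded to one decimal, this should agree with "Total size in memory".
print(f"{total_mib:.1f} MiB")
```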
Variable types
| Numeric | 12 |
|---|---|
| Categorical | 7 |
| grade is highly correlated with bathrooms and 7 other fields | High correlation |
| bathrooms is highly correlated with grade and 3 other fields | High correlation |
| bedrooms is highly correlated with grade and 2 other fields | High correlation |
| sqft_above is highly correlated with grade and 7 other fields | High correlation |
| sqft_living15 is highly correlated with grade and 3 other fields | High correlation |
| floors is highly correlated with grade and 4 other fields | High correlation |
| sqft_lot is highly correlated with zipcode and 3 other fields | High correlation |
| price is highly correlated with grade and 3 other fields | High correlation |
| sqft_lot15 is highly correlated with zipcode and 3 other fields | High correlation |
| sqft_living is highly correlated with grade and 6 other fields | High correlation |
| antiguedad_venta is highly correlated with zipcode and 7 other fields | High correlation |
| waterfront is highly correlated with view | High correlation |
| view is highly correlated with waterfront | High correlation |
| zipcode is highly correlated with sqft_lot and 2 other fields | High correlation |
| sqft_basement is highly correlated with sqft_living | High correlation |
| condition is highly correlated with antiguedad_venta | High correlation |
| df_index has unique values | Unique |
| sqft_basement has 10173 (62.7%) zeros | Zeros |
| yr_renovated has 15641 (96.4%) zeros | Zeros |
| antiguedad_venta has 302 (1.9%) zeros | Zeros |
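The percentages in the "Zeros" alerts follow directly from the zero counts and the number of observations (16228). A minimal sketch of that arithmetic, with the counts copied from the alerts; the helper name `share` is hypothetical:

```python
# Number of observations, from the "Dataset statistics" table.
n_observations = 16228

# Zero counts copied from the "Zeros" alerts.
zero_counts = {
    "sqft_basement": 10173,
    "yr_renovated": 15641,
    "antiguedad_venta": 302,
}


def share(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


zero_share = {name: share(count, n_observations) for name, count in zero_counts.items()}

for name, pct in zero_share.items():
    print(f"{name}: {pct}% zeros")
```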
Reproduction
| Analysis started | 2022-10-02 04:29:24.740179 |
|---|---|
| Analysis finished | 2022-10-02 04:30:03.329709 |
| Duration | 38.59 seconds |
| Software version | pandas-profiling v3.3.0 |
| Download configuration | config.json |
df_index
| Distinct | 16228 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 18813.10118 |
| Minimum | 1 |
|---|---|
| Maximum | 113866 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 126.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1133.35 |
| Q1 | 6227.75 |
| median | 14532.5 |
| Q3 | 27261.5 |
| 95-th percentile | 51233.85 |
| Maximum | 113866 |
| Range | 113865 |
| Interquartile range (IQR) | 21033.75 |
Descriptive statistics
| Standard deviation | 16137.5866 |
|---|---|
| Coefficient of variation (CV) | 0.8577845002 |
| Kurtosis | 1.640962296 |
| Mean | 18813.10118 |
| Median Absolute Deviation (MAD) | 9668 |
| Skewness | 1.266324341 |
| Sum | 305299006 |
| Variance | 260421701.1 |
| Monotonicity | Not monotonic |
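Several of the figures above are derived from others in the same tables: the range from the minimum and maximum, the IQR from Q1 and Q3, the coefficient of variation from the standard deviation and mean, and the variance from the standard deviation. The sketch below recomputes them from the reported values; small differences in the last digits are expected because the inputs are already rounded.

```python
import math

# Values copied from the quantile and descriptive statistics tables above.
minimum, maximum = 1, 113866
q1, q3 = 6227.75, 27261.5
mean, std = 18813.10118, 16137.5866

value_range = maximum - minimum  # reported as "Range"
iqr = q3 - q1                    # reported as "Interquartile range (IQR)"
cv = std / mean                  # reported as "Coefficient of variation (CV)"
variance = std ** 2              # reported as "Variance"

print(value_range, iqr)
print(round(cv, 6), round(variance, 1))

# Compare against the reported figures with a tolerance for rounding.
print(math.isclose(cv, 0.8577845002, rel_tol=1e-6))
print(math.isclose(variance, 260421701.1, rel_tol=1e-6))
```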
| Value | Count | Frequency (%) |
| 19857 | 1 | < 0.1% |
| 4771 | 1 | < 0.1% |
| 3170 | 1 | < 0.1% |
| 27214 | 1 | < 0.1% |
| 45184 | 1 | < 0.1% |
| 13373 | 1 | < 0.1% |
| 21447 | 1 | < 0.1% |
| 11241 | 1 | < 0.1% |
| 68532 | 1 | < 0.1% |
| 27533 | 1 | < 0.1% |
| Other values (16218) | 16218 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 5 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 10 | 1 | |
| 12 | 1 | |
| 14 | 1 | |
| 15 | 1 |
| Value | Count | Frequency (%) |
| 113866 | 1 | |
| 111906 | 1 | |
| 109571 | 1 | |
| 108311 | 1 | |
| 99297 | 1 | |
| 98195 | 1 | |
| 98015 | 1 | |
| 94768 | 1 | |
| 94083 | 1 | |
| 93739 | 1 |
| Distinct | 70 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 98078.55928 |
| Minimum | 98001 |
|---|---|
| Maximum | 98199 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 63.5 KiB |
Quantile statistics
| Minimum | 98001 |
|---|---|
| 5-th percentile | 98005 |
| Q1 | 98033 |
| median | 98065 |
| Q3 | 98118 |
| 95-th percentile | 98177 |
| Maximum | 98199 |
| Range | 198 |
| Interquartile range (IQR) | 85 |
Descriptive statistics
| Standard deviation | 53.24375101 |
|---|---|
| Coefficient of variation (CV) | 0.0005428684047 |
| Kurtosis | -0.8597956826 |
| Mean | 98078.55928 |
| Median Absolute Deviation (MAD) | 42 |
| Skewness | 0.3980821458 |
| Sum | 1591618860 |
| Variance | 2834.897022 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 98038 | 469 | 2.9% |
| 98103 | 468 | 2.9% |
| 98115 | 462 | 2.8% |
| 98052 | 455 | 2.8% |
| 98042 | 445 | 2.7% |
| 98117 | 433 | 2.7% |
| 98034 | 429 | 2.6% |
| 98023 | 399 | 2.5% |
| 98133 | 397 | 2.4% |
| 98118 | 393 | 2.4% |
| Other values (60) | 11878 |
| Value | Count | Frequency (%) |
| 98001 | 294 | |
| 98002 | 155 | |
| 98003 | 212 | |
| 98004 | 126 | 0.8% |
| 98005 | 122 | 0.8% |
| 98006 | 322 | |
| 98007 | 107 | 0.7% |
| 98008 | 213 | |
| 98010 | 90 | 0.6% |
| 98011 | 150 |
| Value | Count | Frequency (%) |
| 98199 | 212 | |
| 98198 | 221 | |
| 98188 | 105 | 0.6% |
| 98178 | 209 | |
| 98177 | 179 | |
| 98168 | 218 | |
| 98166 | 193 | |
| 98155 | 364 | |
| 98148 | 51 | 0.3% |
| 98146 | 219 |
| Distinct | 10 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.516144935 |
| Minimum | 3 |
|---|---|
| Maximum | 12 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 63.5 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 7 |
| median | 7 |
| Q3 | 8 |
| 95-th percentile | 9 |
| Maximum | 12 |
| Range | 9 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.02266091 |
|---|---|
| Coefficient of variation (CV) | 0.1360618933 |
| Kurtosis | 0.7856244589 |
| Mean | 7.516144935 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.4890425822 |
| Sum | 121972 |
| Variance | 1.045835338 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 7 | 7176 | 44.2% |
| 8 | 4737 | 29.2% |
| 9 | 1822 | 11.2% |
| 6 | 1622 | 10.0% |
| 10 | 557 | 3.4% |
| 5 | 188 | 1.2% |
| 11 | 97 | 0.6% |
| 4 | 23 | 0.1% |
| 3 | 3 | < 0.1% |
| 12 | 3 | < 0.1% |
| Value | Count | Frequency (%) |
| 3 | 3 | < 0.1% |
| 4 | 23 | 0.1% |
| 5 | 188 | 1.2% |
| 6 | 1622 | 10.0% |
| 7 | 7176 | 44.2% |
| 8 | 4737 | 29.2% |
| 9 | 1822 | 11.2% |
| 10 | 557 | 3.4% |
| 11 | 97 | 0.6% |
| 12 | 3 | < 0.1% |
| Value | Count | Frequency (%) |
| 12 | 3 | < 0.1% |
| 11 | 97 | 0.6% |
| 10 | 557 | 3.4% |
| 9 | 1822 | 11.2% |
| 8 | 4737 | 29.2% |
| 7 | 7176 | 44.2% |
| 6 | 1622 | 10.0% |
| 5 | 188 | 1.2% |
| 4 | 23 | 0.1% |
| 3 | 3 | < 0.1% |
sqft_basement
| Distinct | 242 |
|---|---|
| Distinct (%) | 1.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 258.5320434 |
| Minimum | 0 |
|---|---|
| Maximum | 2720 |
| Zeros | 10173 |
| Zeros (%) | 62.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 126.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 500 |
| 95-th percentile | 1080 |
| Maximum | 2720 |
| Range | 2720 |
| Interquartile range (IQR) | 500 |
Descriptive statistics
| Standard deviation | 399.2837658 |
|---|---|
| Coefficient of variation (CV) | 1.544426604 |
| Kurtosis | 1.321276483 |
| Mean | 258.5320434 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.445927685 |
| Sum | 4195458 |
| Variance | 159427.5256 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 10173 | |
| 600 | 178 | 1.1% |
| 500 | 172 | 1.1% |
| 700 | 154 | 0.9% |
| 800 | 148 | 0.9% |
| 400 | 138 | 0.9% |
| 300 | 106 | 0.7% |
| 900 | 105 | 0.6% |
| 1000 | 105 | 0.6% |
| 480 | 89 | 0.5% |
| Other values (232) | 4860 |
| Value | Count | Frequency (%) |
| 0 | 10173 | |
| 10 | 2 | < 0.1% |
| 20 | 1 | < 0.1% |
| 40 | 4 | < 0.1% |
| 50 | 7 | < 0.1% |
| 60 | 9 | 0.1% |
| 65 | 1 | < 0.1% |
| 70 | 5 | < 0.1% |
| 80 | 13 | 0.1% |
| 90 | 17 | 0.1% |
| Value | Count | Frequency (%) |
| 2720 | 1 | |
| 2600 | 1 | |
| 2300 | 1 | |
| 2250 | 1 | |
| 2170 | 1 | |
| 2160 | 1 | |
| 2150 | 1 | |
| 2110 | 1 | |
| 2090 | 1 | |
| 2080 | 1 |
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 919.3 KiB |
| 0 | 15011 |
|---|---|
| 2 | 621 |
| 3 | 263 |
| 1 | 231 |
| 4 | 102 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 16228 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 15011 | 92.5% |
| 2 | 621 | 3.8% |
| 3 | 263 | 1.6% |
| 1 | 231 | 1.4% |
| 4 | 102 | 0.6% |
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 15011 | |
| 2 | 621 | 3.8% |
| 3 | 263 | 1.6% |
| 1 | 231 | 1.4% |
| 4 | 102 | 0.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 15011 | |
| 2 | 621 | 3.8% |
| 3 | 263 | 1.6% |
| 1 | 231 | 1.4% |
| 4 | 102 | 0.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 16228 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 15011 | |
| 2 | 621 | 3.8% |
| 3 | 263 | 1.6% |
| 1 | 231 | 1.4% |
| 4 | 102 | 0.6% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 16228 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 15011 | |
| 2 | 621 | 3.8% |
| 3 | 263 | 1.6% |
| 1 | 231 | 1.4% |
| 4 | 102 | 0.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 16228 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 15011 | |
| 2 | 621 | 3.8% |
| 3 | 263 | 1.6% |
| 1 | 231 | 1.4% |
| 4 | 102 | 0.6% |
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 951.0 KiB |
| 2.0 | 8162 |
|---|---|
| 1.0 | 6715 |
| 3.0 | 1265 |
| 4.0 | 86 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 48684 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 2.0 |
| 4th row | 1.0 |
| 5th row | 2.0 |
Common Values
| Value | Count | Frequency (%) |
| 2.0 | 8162 | 50.3% |
| 1.0 | 6715 | 41.4% |
| 3.0 | 1265 | 7.8% |
| 4.0 | 86 | 0.5% |
Category Frequency Plot
| Value | Count | Frequency (%) |
| 2.0 | 8162 | |
| 1.0 | 6715 | |
| 3.0 | 1265 | 7.8% |
| 4.0 | 86 | 0.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| . | 16228 | |
| 0 | 16228 | |
| 2 | 8162 | |
| 1 | 6715 | |
| 3 | 1265 | 2.6% |
| 4 | 86 | 0.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 32456 | |
| Other Punctuation | 16228 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 16228 | |
| 2 | 8162 | |
| 1 | 6715 | |
| 3 | 1265 | 3.9% |
| 4 | 86 | 0.3% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 16228 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 48684 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| . | 16228 | |
| 0 | 16228 | |
| 2 | 8162 | |
| 1 | 6715 | |
| 3 | 1265 | 2.6% |
| 4 | 86 | 0.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 48684 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| . | 16228 | |
| 0 | 16228 | |
| 2 | 8162 | |
| 1 | 6715 | |
| 3 | 1265 | 2.6% |
| 4 | 86 | 0.2% |
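The character-level tables for this variable can be derived from its value counts: every value is a three-character string, so the 16228 rows contribute 48684 characters in total. The sketch below rebuilds the character and Unicode category counts using the standard library's `unicodedata.category`, where the codes `Nd` and `Po` correspond to "Decimal Number" and "Other Punctuation". The variable names are illustrative only.

```python
from collections import Counter
import unicodedata

# Value counts copied from the "Common Values" table for this variable.
value_counts = {"2.0": 8162, "1.0": 6715, "3.0": 1265, "4.0": 86}

# Each occurrence of a value contributes each of its characters once.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

# Group characters by Unicode general category (e.g. 'Nd', 'Po').
category_counts = Counter()
for ch, count in char_counts.items():
    category_counts[unicodedata.category(ch)] += count

total_chars = sum(char_counts.values())

for ch, count in char_counts.most_common():
    print(ch, count, f"{100 * count / total_chars:.1f}%")
print(dict(category_counts), total_chars)
```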
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 951.0 KiB |
| 3.0 | 7708 |
|---|---|
| 4.0 | 5078 |
| 2.0 | 2205 |
| 5.0 | 1072 |
| 1.0 | 165 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 48684 |
|---|---|
| Distinct characters | 7 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 3.0 |
|---|---|
| 2nd row | 3.0 |
| 3rd row | 4.0 |
| 4th row | 5.0 |
| 5th row | 3.0 |
Common Values
| Value | Count | Frequency (%) |
| 3.0 | 7708 | 47.5% |
| 4.0 | 5078 | 31.3% |
| 2.0 | 2205 | 13.6% |
| 5.0 | 1072 | 6.6% |
| 1.0 | 165 | 1.0% |
Category Frequency Plot
| Value | Count | Frequency (%) |
| 3.0 | 7708 | |
| 4.0 | 5078 | |
| 2.0 | 2205 | 13.6% |
| 5.0 | 1072 | 6.6% |
| 1.0 | 165 | 1.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| . | 16228 | |
| 0 | 16228 | |
| 3 | 7708 | |
| 4 | 5078 | 10.4% |
| 2 | 2205 | 4.5% |
| 5 | 1072 | 2.2% |
| 1 | 165 | 0.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 32456 | |
| Other Punctuation | 16228 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 16228 | |
| 3 | 7708 | |
| 4 | 5078 | 15.6% |
| 2 | 2205 | 6.8% |
| 5 | 1072 | 3.3% |
| 1 | 165 | 0.5% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 16228 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 48684 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| . | 16228 | |
| 0 | 16228 | |
| 3 | 7708 | |
| 4 | 5078 | 10.4% |
| 2 | 2205 | 4.5% |
| 5 | 1072 | 2.2% |
| 1 | 165 | 0.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 48684 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| . | 16228 | |
| 0 | 16228 | |
| 3 | 7708 | |
| 4 | 5078 | 10.4% |
| 2 | 2205 | 4.5% |
| 5 | 1072 | 2.2% |
| 1 | 165 | 0.3% |
| Distinct | 747 |
|---|---|
| Distinct (%) | 4.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1696.06002 |
| Minimum | 380 |
|---|---|
| Maximum | 5710 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 126.9 KiB |
Quantile statistics
| Minimum | 380 |
|---|---|
| 5-th percentile | 840 |
| Q1 | 1170 |
| median | 1510 |
| Q3 | 2090 |
| 95-th percentile | 3130 |
| Maximum | 5710 |
| Range | 5330 |
| Interquartile range (IQR) | 920 |
Descriptive statistics
| Standard deviation | 714.9626592 |
|---|---|
| Coefficient of variation (CV) | 0.4215432537 |
| Kurtosis | 0.9715370441 |
| Mean | 1696.06002 |
| Median Absolute Deviation (MAD) | 410 |
| Skewness | 1.075612747 |
| Sum | 27523662 |
| Variance | 511171.6041 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1200 | 164 | 1.0% |
| 1300 | 159 | 1.0% |
| 1010 | 157 | 1.0% |
| 1400 | 152 | 0.9% |
| 1220 | 148 | 0.9% |
| 1340 | 148 | 0.9% |
| 1180 | 146 | 0.9% |
| 1140 | 144 | 0.9% |
| 1060 | 143 | 0.9% |
| 1100 | 138 | 0.9% |
| Other values (737) | 14729 |
| Value | Count | Frequency (%) |
| 380 | 1 | < 0.1% |
| 390 | 1 | < 0.1% |
| 420 | 2 | < 0.1% |
| 430 | 1 | < 0.1% |
| 440 | 1 | < 0.1% |
| 470 | 2 | < 0.1% |
| 480 | 4 | |
| 490 | 2 | < 0.1% |
| 500 | 2 | < 0.1% |
| 520 | 6 |
| Value | Count | Frequency (%) |
| 5710 | 1 | |
| 5480 | 1 | |
| 5450 | 1 | |
| 5320 | 1 | |
| 5250 | 1 | |
| 5190 | 1 | |
| 5070 | 1 | |
| 4930 | 1 | |
| 4850 | 1 | |
| 4750 | 2 |
| Distinct | 662 |
|---|---|
| Distinct (%) | 4.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1914.327089 |
| Minimum | 399 |
|---|---|
| Maximum | 5380 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 126.9 KiB |
Quantile statistics
| Minimum | 399 |
|---|---|
| 5-th percentile | 1120 |
| Q1 | 1470 |
| median | 1790 |
| Q3 | 2270 |
| 95-th percentile | 3080 |
| Maximum | 5380 |
| Range | 4981 |
| Interquartile range (IQR) | 800 |
Descriptive statistics
| Standard deviation | 608.6824722 |
|---|---|
| Coefficient of variation (CV) | 0.3179615833 |
| Kurtosis | 0.7684819671 |
| Mean | 1914.327089 |
| Median Absolute Deviation (MAD) | 380 |
| Skewness | 0.8938470952 |
| Sum | 31065700 |
| Variance | 370494.352 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1440 | 158 | 1.0% |
| 1560 | 155 | 1.0% |
| 1540 | 154 | 0.9% |
| 1500 | 148 | 0.9% |
| 1460 | 144 | 0.9% |
| 1580 | 140 | 0.9% |
| 1720 | 137 | 0.8% |
| 1480 | 133 | 0.8% |
| 1620 | 133 | 0.8% |
| 1520 | 133 | 0.8% |
| Other values (652) | 14793 |
| Value | Count | Frequency (%) |
| 399 | 1 | < 0.1% |
| 460 | 1 | < 0.1% |
| 620 | 2 | < 0.1% |
| 670 | 1 | < 0.1% |
| 690 | 2 | < 0.1% |
| 700 | 2 | < 0.1% |
| 710 | 1 | < 0.1% |
| 720 | 2 | < 0.1% |
| 740 | 5 | |
| 750 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 5380 | 1 | |
| 4950 | 1 | |
| 4920 | 1 | |
| 4640 | 1 | |
| 4600 | 1 | |
| 4590 | 1 | |
| 4530 | 1 | |
| 4510 | 1 | |
| 4495 | 1 | |
| 4490 | 1 |
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 919.3 KiB |
| 0 | 16181 |
|---|---|
| 1 | 47 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 16228 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 16181 | 99.7% |
| 1 | 47 | 0.3% |
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 16181 | |
| 1 | 47 | 0.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 16181 | |
| 1 | 47 | 0.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 16228 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 16181 | |
| 1 | 47 | 0.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 16228 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 16181 | |
| 1 | 47 | 0.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 16228 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 16181 | |
| 1 | 47 | 0.3% |
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 951.0 KiB |
| 1.0 | 9789 |
|---|---|
| 2.0 | 5974 |
| 3.0 | 465 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 48684 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 2.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
| 1.0 | 9789 | 60.3% |
| 2.0 | 5974 | 36.8% |
| 3.0 | 465 | 2.9% |
Category Frequency Plot
| Value | Count | Frequency (%) |
| 1.0 | 9789 | |
| 2.0 | 5974 | |
| 3.0 | 465 | 2.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| . | 16228 | |
| 0 | 16228 | |
| 1 | 9789 | |
| 2 | 5974 | 12.3% |
| 3 | 465 | 1.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 32456 | |
| Other Punctuation | 16228 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 16228 | |
| 1 | 9789 | |
| 2 | 5974 | 18.4% |
| 3 | 465 | 1.4% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 16228 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 48684 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| . | 16228 | |
| 0 | 16228 | |
| 1 | 9789 | |
| 2 | 5974 | 12.3% |
| 3 | 465 | 1.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 48684 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| . | 16228 | |
| 0 | 16228 | |
| 1 | 9789 | |
| 2 | 5974 | 12.3% |
| 3 | 465 | 1.0% |
yr_renovated
| Distinct | 70 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 72.17130885 |
| Minimum | 0 |
|---|---|
| Maximum | 2015 |
| Zeros | 15641 |
| Zeros (%) | 96.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 126.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 2015 |
| Range | 2015 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 372.5691967 |
|---|---|
| Coefficient of variation (CV) | 5.162289595 |
| Kurtosis | 22.6983639 |
| Mean | 72.17130885 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.969257204 |
| Sum | 1171196 |
| Variance | 138807.8064 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 15641 | |
| 2014 | 68 | 0.4% |
| 2013 | 27 | 0.2% |
| 2000 | 24 | 0.1% |
| 2007 | 22 | 0.1% |
| 2005 | 19 | 0.1% |
| 2003 | 18 | 0.1% |
| 1990 | 17 | 0.1% |
| 2006 | 17 | 0.1% |
| 2009 | 16 | 0.1% |
| Other values (60) | 359 | 2.2% |
| Value | Count | Frequency (%) |
| 0 | 15641 | |
| 1934 | 1 | < 0.1% |
| 1940 | 2 | < 0.1% |
| 1944 | 1 | < 0.1% |
| 1945 | 2 | < 0.1% |
| 1946 | 2 | < 0.1% |
| 1948 | 1 | < 0.1% |
| 1950 | 2 | < 0.1% |
| 1951 | 1 | < 0.1% |
| 1953 | 3 | < 0.1% |
| Value | Count | Frequency (%) |
| 2015 | 12 | 0.1% |
| 2014 | 68 | |
| 2013 | 27 | 0.2% |
| 2012 | 9 | 0.1% |
| 2011 | 8 | < 0.1% |
| 2010 | 8 | < 0.1% |
| 2009 | 16 | 0.1% |
| 2008 | 10 | 0.1% |
| 2007 | 22 | 0.1% |
| 2006 | 17 | 0.1% |
| Distinct | 6401 |
|---|---|
| Distinct (%) | 39.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7151.695075 |
| Minimum | 520 |
|---|---|
| Maximum | 17622 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 126.9 KiB |
Quantile statistics
| Minimum | 520 |
|---|---|
| 5-th percentile | 1688.35 |
| Q1 | 5000 |
| median | 7151.695075 |
| Q3 | 8800 |
| 95-th percentile | 13178.4 |
| Maximum | 17622 |
| Range | 17102 |
| Interquartile range (IQR) | 3800 |
Descriptive statistics
| Standard deviation | 3195.496209 |
|---|---|
| Coefficient of variation (CV) | 0.4468166184 |
| Kurtosis | 0.4870625894 |
| Mean | 7151.695075 |
| Median Absolute Deviation (MAD) | 1993 |
| Skewness | 0.4849176236 |
| Sum | 116057707.7 |
| Variance | 10211196.02 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 7151.695075 | 1772 | 10.9% |
| 5000 | 270 | 1.7% |
| 6000 | 214 | 1.3% |
| 4000 | 191 | 1.2% |
| 7200 | 162 | 1.0% |
| 4800 | 93 | 0.6% |
| 9600 | 90 | 0.6% |
| 4500 | 90 | 0.6% |
| 7500 | 89 | 0.5% |
| 8400 | 83 | 0.5% |
| Other values (6391) | 13174 |
| Value | Count | Frequency (%) |
| 520 | 1 | |
| 600 | 1 | |
| 635 | 1 | |
| 638 | 1 | |
| 649 | 2 | |
| 651 | 1 | |
| 676 | 1 | |
| 681 | 1 | |
| 683 | 1 | |
| 690 | 2 |
| Value | Count | Frequency (%) |
| 17622 | 1 | < 0.1% |
| 17600 | 3 | |
| 17585 | 1 | < 0.1% |
| 17583 | 1 | < 0.1% |
| 17577 | 1 | < 0.1% |
| 17550 | 1 | < 0.1% |
| 17541 | 1 | < 0.1% |
| 17532 | 1 | < 0.1% |
| 17500 | 1 | < 0.1% |
| 17487 | 1 | < 0.1% |
| Distinct | 3033 |
|---|---|
| Distinct (%) | 18.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 471191.77 |
| Minimum | 75000 |
|---|---|
| Maximum | 1130000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 126.9 KiB |
Quantile statistics
| Minimum | 75000 |
|---|---|
| 5-th percentile | 209640 |
| Q1 | 314000 |
| median | 432500 |
| Q3 | 597500 |
| 95-th percentile | 865000 |
| Maximum | 1130000 |
| Range | 1055000 |
| Interquartile range (IQR) | 283500 |
Descriptive statistics
| Standard deviation | 202745.1953 |
|---|---|
| Coefficient of variation (CV) | 0.4302816989 |
| Kurtosis | -0.06389500165 |
| Mean | 471191.77 |
| Median Absolute Deviation (MAD) | 133500 |
| Skewness | 0.7198897063 |
| Sum | 7646500043 |
| Variance | 4.110561421 × 10^10 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 350000 | 141 | 0.9% |
| 450000 | 135 | 0.8% |
| 425000 | 131 | 0.8% |
| 500000 | 128 | 0.8% |
| 550000 | 128 | 0.8% |
| 325000 | 116 | 0.7% |
| 375000 | 116 | 0.7% |
| 400000 | 110 | 0.7% |
| 250000 | 106 | 0.7% |
| 300000 | 105 | 0.6% |
| Other values (3023) | 15012 |
| Value | Count | Frequency (%) |
| 75000 | 1 | |
| 78000 | 1 | |
| 80000 | 1 | |
| 81000 | 1 | |
| 82000 | 1 | |
| 82500 | 1 | |
| 83000 | 1 | |
| 84000 | 1 | |
| 85000 | 2 | |
| 89000 | 1 |
| Value | Count | Frequency (%) |
| 1130000 | 4 | < 0.1% |
| 1122500 | 1 | < 0.1% |
| 1120280 | 1 | < 0.1% |
| 1120000 | 4 | < 0.1% |
| 1115500 | 1 | < 0.1% |
| 1112750 | 1 | < 0.1% |
| 1110000 | 6 | < 0.1% |
| 1103990 | 1 | < 0.1% |
| 1102030 | 1 | < 0.1% |
| 1100000 | 24 |
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 919.3 KiB |
| 3 | 10574 |
|---|---|
| 4 | 4279 |
| 5 | 1217 |
| 2 | 134 |
| 1 | 24 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 16228 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 3 |
|---|---|
| 2nd row | 3 |
| 3rd row | 3 |
| 4th row | 3 |
| 5th row | 3 |
Common Values
| Value | Count | Frequency (%) |
| 3 | 10574 | 65.2% |
| 4 | 4279 | 26.4% |
| 5 | 1217 | 7.5% |
| 2 | 134 | 0.8% |
| 1 | 24 | 0.1% |
Category Frequency Plot
| Value | Count | Frequency (%) |
| 3 | 10574 | |
| 4 | 4279 | |
| 5 | 1217 | 7.5% |
| 2 | 134 | 0.8% |
| 1 | 24 | 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 10574 | |
| 4 | 4279 | |
| 5 | 1217 | 7.5% |
| 2 | 134 | 0.8% |
| 1 | 24 | 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 16228 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 10574 | |
| 4 | 4279 | |
| 5 | 1217 | 7.5% |
| 2 | 134 | 0.8% |
| 1 | 24 | 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 16228 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 10574 | |
| 4 | 4279 | |
| 5 | 1217 | 7.5% |
| 2 | 134 | 0.8% |
| 1 | 24 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 16228 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 10574 | |
| 4 | 4279 | |
| 5 | 1217 | 7.5% |
| 2 | 134 | 0.8% |
| 1 | 24 | 0.1% |
| Distinct | 6011 |
|---|---|
| Distinct (%) | 37.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7286.342928 |
| Minimum | 659 |
|---|---|
| Maximum | 20000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 126.9 KiB |
Quantile statistics
| Minimum | 659 |
|---|---|
| 5-th percentile | 1916 |
| Q1 | 5039.5 |
| median | 7286.342928 |
| Q3 | 8869 |
| 95-th percentile | 13129 |
| Maximum | 20000 |
| Range | 19341 |
| Interquartile range (IQR) | 3829.5 |
Descriptive statistics
| Standard deviation | 3231.976368 |
|---|---|
| Coefficient of variation (CV) | 0.4435663267 |
| Kurtosis | 1.3278936 |
| Mean | 7286.342928 |
| Median Absolute Deviation (MAD) | 2013.657072 |
| Skewness | 0.7161463828 |
| Sum | 118242773 |
| Variance | 10445671.24 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 7286.342928 | 1289 | 7.9% |
| 5000 | 325 | 2.0% |
| 4000 | 275 | 1.7% |
| 6000 | 222 | 1.4% |
| 7200 | 161 | 1.0% |
| 7500 | 108 | 0.7% |
| 4800 | 103 | 0.6% |
| 4500 | 92 | 0.6% |
| 8400 | 85 | 0.5% |
| 3600 | 84 | 0.5% |
| Other values (6001) | 13484 |
| Value | Count | Frequency (%) |
| 659 | 1 | < 0.1% |
| 660 | 1 | < 0.1% |
| 748 | 1 | < 0.1% |
| 750 | 3 | |
| 755 | 1 | < 0.1% |
| 758 | 1 | < 0.1% |
| 794 | 1 | < 0.1% |
| 810 | 2 | |
| 886 | 3 | |
| 887 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 20000 | 11 | |
| 19998 | 2 | < 0.1% |
| 19965 | 1 | < 0.1% |
| 19961 | 1 | < 0.1% |
| 19939 | 1 | < 0.1% |
| 19916 | 1 | < 0.1% |
| 19908 | 1 | < 0.1% |
| 19878 | 1 | < 0.1% |
| 19868 | 2 | < 0.1% |
| 19856 | 1 | < 0.1% |
| Distinct | 797 |
|---|---|
| Distinct (%) | 4.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1954.592063 |
| Minimum | 380 |
|---|---|
| Maximum | 7350 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 126.9 KiB |
Quantile statistics
| Minimum | 380 |
|---|---|
| 5-th percentile | 920 |
| Q1 | 1390 |
| median | 1840 |
| Q3 | 2410 |
| 95-th percentile | 3350 |
| Maximum | 7350 |
| Range | 6970 |
| Interquartile range (IQR) | 1020 |
Descriptive statistics
| Standard deviation | 756.0729845 |
|---|---|
| Coefficient of variation (CV) | 0.3868188144 |
| Kurtosis | 0.9271512852 |
| Mean | 1954.592063 |
| Median Absolute Deviation (MAD) | 500 |
| Skewness | 0.809911147 |
| Sum | 31719120 |
| Variance | 571646.3579 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1440 | 112 | 0.7% |
| 1400 | 110 | 0.7% |
| 1300 | 107 | 0.7% |
| 1480 | 103 | 0.6% |
| 1540 | 103 | 0.6% |
| 1010 | 100 | 0.6% |
| 1660 | 100 | 0.6% |
| 1720 | 100 | 0.6% |
| 1820 | 99 | 0.6% |
| 1560 | 99 | 0.6% |
| Other values (787) | 15195 |
| Value | Count | Frequency (%) |
| 380 | 1 | < 0.1% |
| 390 | 1 | < 0.1% |
| 420 | 2 | < 0.1% |
| 430 | 1 | < 0.1% |
| 440 | 1 | < 0.1% |
| 470 | 2 | < 0.1% |
| 480 | 2 | < 0.1% |
| 490 | 1 | < 0.1% |
| 500 | 1 | < 0.1% |
| 520 | 6 |
| Value | Count | Frequency (%) |
| 7350 | 1 | |
| 7120 | 1 | |
| 6050 | 1 | |
| 5820 | 1 | |
| 5774 | 1 | |
| 5710 | 1 | |
| 5660 | 1 | |
| 5635 | 1 | |
| 5610 | 1 | |
| 5470 | 1 |
yr_date
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 998.5 KiB |
| 2014.0 | 11023 |
|---|---|
| 2015.0 | 5205 |
Length
| Max length | 6 |
|---|---|
| Median length | 6 |
| Mean length | 6 |
| Min length | 6 |
Characters and Unicode
| Total characters | 97368 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2015.0 |
|---|---|
| 2nd row | 2015.0 |
| 3rd row | 2014.0 |
| 4th row | 2015.0 |
| 5th row | 2015.0 |
Common Values
| Value | Count | Frequency (%) |
| 2014.0 | 11023 | 67.9% |
| 2015.0 | 5205 | 32.1% |
Category Frequency Plot
| Value | Count | Frequency (%) |
| 2014.0 | 11023 | |
| 2015.0 | 5205 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 32456 | |
| 2 | 16228 | |
| 1 | 16228 | |
| . | 16228 | |
| 4 | 11023 | 11.3% |
| 5 | 5205 | 5.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 81140 | |
| Other Punctuation | 16228 | 16.7% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 32456 | |
| 2 | 16228 | |
| 1 | 16228 | |
| 4 | 11023 | 13.6% |
| 5 | 5205 | 6.4% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 16228 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 97368 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 32456 | |
| 2 | 16228 | |
| 1 | 16228 | |
| . | 16228 | |
| 4 | 11023 | 11.3% |
| 5 | 5205 | 5.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 97368 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 32456 | |
| 2 | 16228 | |
| 1 | 16228 | |
| . | 16228 | |
| 4 | 11023 | 11.3% |
| 5 | 5205 | 5.3% |
antiguedad_venta
| Distinct | 117 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 43.41767316 |
| Minimum | -1 |
| Maximum | 115 |
| Zeros | 302 |
| Zeros (%) | 1.9% |
| Negative | 12 |
| Negative (%) | 0.1% |
| Memory size | 126.9 KiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 18 |
| median | 40 |
| Q3 | 63 |
| 95-th percentile | 99 |
| Maximum | 115 |
| Range | 116 |
| Interquartile range (IQR) | 45 |
Descriptive statistics
| Standard deviation | 29.10348042 |
|---|---|
| Coefficient of variation (CV) | 0.670314144 |
| Kurtosis | -0.643806599 |
| Mean | 43.41767316 |
| Median Absolute Deviation (MAD) | 22 |
| Skewness | 0.4593440295 |
| Sum | 704582 |
| Variance | 847.0125723 |
| Monotonicity | Not monotonic |
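Several of the figures above are derived from one another, which gives a quick consistency check. The sketch below takes the sum, standard deviation, quartiles and extremes as reported, assumes the observation count of 16228 from the dataset overview, and recomputes the mean, variance, coefficient of variation, IQR and range. The variable names are illustrative only.

```python
# Values copied from the report.
n_observations = 16228
total = 704582
std = 29.10348042
q1, q3 = 18, 63
minimum, maximum = -1, 115

mean = total / n_observations      # Mean = Sum / number of observations
variance = std ** 2                # Variance = standard deviation squared
cv = std / mean                    # Coefficient of variation = std / mean
iqr = q3 - q1                      # Interquartile range = Q3 - Q1
value_range = maximum - minimum    # Range = Maximum - Minimum

print(mean, variance, cv, iqr, value_range)
```

Each recomputed value agrees with the corresponding table entry up to the rounding of the reported standard deviation.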
| Value | Count | Frequency (%) |
|---|---|---|
| 11 | 343 | 2.1% |
| 9 | 343 | 2.1% |
| 10 | 326 | 2.0% |
| 8 | 325 | 2.0% |
| 37 | 310 | 1.9% |
| 0 | 302 | 1.9% |
| 36 | 295 | 1.8% |
| 7 | 290 | 1.8% |
| 46 | 268 | 1.7% |
| 47 | 268 | 1.7% |
| Other values (107) | 13158 | 81.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| -1 | 12 | 0.1% |
| 0 | 302 | 1.9% |
| 1 | 203 | 1.3% |
| 2 | 127 | 0.8% |
| 3 | 117 | 0.7% |
| 4 | 100 | 0.6% |
| 5 | 148 | 0.9% |
| 6 | 237 | 1.5% |
| 7 | 290 | 1.8% |
| 8 | 325 | 2.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| 115 | 21 | 0.1% |
| 114 | 47 | 0.3% |
| 113 | 20 | 0.1% |
| 112 | 25 | 0.2% |
| 111 | 38 | 0.2% |
| 110 | 39 | 0.2% |
| 109 | 43 | 0.3% |
| 108 | 61 | 0.4% |
| 107 | 65 | 0.4% |
| 106 | 52 | 0.3% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
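As a minimal sketch of that calculation, the code below ranks both variables (ties receive the average of their positions) and then applies the covariance-over-standard-deviations formula to the ranks. The helper names `average_ranks`, `pearson` and `spearman` and the toy data are illustrative assumptions, not the profiling library's implementation.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/n factors cancel, so plain sums of deviations are used."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

On this toy data ρ is 1 up to floating-point error, because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.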
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
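The formula and the invariance property can be sketched in a few lines. The five value pairs are the `sqft_living` and `price` entries of the first five sample rows shown in this report; they are far too few to reproduce the report's correlation and serve only as an illustration. The helper name `pearson` and the particular shift and scale factors are assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# First five sample rows of the report (illustration only).
sqft_living = [2610.0, 2210.0, 2650.0, 1950.0, 1630.0]
price = [810000.0, 685000.0, 725000.0, 274000.0, 445000.0]

r = pearson(sqft_living, price)

# Change location and scale separately: square feet to square metres,
# and price to thousands plus an arbitrary offset (positive scale factors).
sqm_living = [v * 0.092903 for v in sqft_living]
price_shifted = [p / 1000.0 + 50.0 for p in price]

r_rescaled = pearson(sqm_living, price_shifted)
print(r, r_rescaled)
```

Both calls return the same value up to floating-point error. The invariance holds for positive scale factors; a negative factor on one variable would flip the sign of r.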
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
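The pair-counting definition translates directly into code. The sketch below implements exactly the formula stated above (often called τ-a), in which tied pairs count as neither concordant nor discordant; library implementations commonly use a tie-adjusted variant instead, so results may differ when ties are present. The function name `kendall_tau_a` and the toy data are illustrative assumptions.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # both variables move the same way
            concordant += 1
        elif direction < 0:    # the variables move in opposite ways
            discordant += 1
        # direction == 0 is a tie and contributes to neither count
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]

tau = kendall_tau_a(x, y)            # 7 concordant, 3 discordant, 10 pairs
tau_reversed = kendall_tau_a(x, x[::-1])
print(tau, tau_reversed)
```

For the toy data this gives τ = 0.4, and a perfectly reversed ordering gives τ = -1.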
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.
First rows
| | df_index | zipcode | grade | sqft_basement | view | bathrooms | bedrooms | sqft_above | sqft_living15 | waterfront | floors | yr_renovated | sqft_lot | price | condition | sqft_lot15 | sqft_living | yr_date | antiguedad_venta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19857 | 98006 | 10 | 0.0 | 0 | 2.0 | 3.0 | 2610.0 | 3140.0 | 0 | 2.0 | 0.0 | 8481.000000 | 810000.0 | 3 | 10008.0 | 2610.0 | 2015.0 | 22.0 |
| 1 | 14014 | 98033 | 8 | 650.0 | 1 | 1.0 | 3.0 | 1560.0 | 2210.0 | 0 | 1.0 | 0.0 | 8955.000000 | 685000.0 | 3 | 8976.0 | 2210.0 | 2015.0 | 41.0 |
| 2 | 32909 | 98005 | 8 | 0.0 | 0 | 2.0 | 4.0 | 2650.0 | 2230.0 | 0 | 2.0 | 0.0 | 7151.695075 | 725000.0 | 3 | 19856.0 | 2650.0 | 2014.0 | 28.0 |
| 3 | 16305 | 98001 | 7 | 900.0 | 0 | 1.0 | 5.0 | 1050.0 | 1660.0 | 0 | 1.0 | 0.0 | 8720.000000 | 274000.0 | 3 | 8030.0 | 1950.0 | 2015.0 | 53.0 |
| 4 | 6647 | 98011 | 7 | 320.0 | 0 | 2.0 | 3.0 | 1310.0 | 1620.0 | 0 | 1.0 | 0.0 | 6449.000000 | 445000.0 | 3 | 7429.0 | 1630.0 | 2015.0 | 29.0 |
| 5 | 5865 | 98040 | 8 | 850.0 | 0 | 2.0 | 4.0 | 1760.0 | 2550.0 | 0 | 1.0 | 0.0 | 8760.000000 | 762500.0 | 4 | 10376.0 | 2610.0 | 2014.0 | 36.0 |
| 6 | 8009 | 98004 | 8 | 0.0 | 1 | 1.0 | 3.0 | 1700.0 | 2630.0 | 0 | 1.0 | 0.0 | 14133.000000 | 979000.0 | 4 | 17376.0 | 1700.0 | 2014.0 | 60.0 |
| 7 | 4731 | 98011 | 8 | 780.0 | 0 | 3.0 | 5.0 | 2090.0 | 2640.0 | 0 | 2.0 | 0.0 | 4369.000000 | 540000.0 | 3 | 4610.0 | 2870.0 | 2014.0 | 7.0 |
| 8 | 38480 | 98052 | 9 | 0.0 | 0 | 2.0 | 4.0 | 2700.0 | 2730.0 | 0 | 2.0 | 0.0 | 8810.000000 | 690000.0 | 3 | 5100.0 | 2700.0 | 2014.0 | 10.0 |
| 9 | 13246 | 98072 | 7 | 530.0 | 0 | 1.0 | 3.0 | 1130.0 | 1260.0 | 0 | 1.0 | 0.0 | 9673.000000 | 375000.0 | 3 | 9681.0 | 1660.0 | 2014.0 | 38.0 |
Last rows
| | df_index | zipcode | grade | sqft_basement | view | bathrooms | bedrooms | sqft_above | sqft_living15 | waterfront | floors | yr_renovated | sqft_lot | price | condition | sqft_lot15 | sqft_living | yr_date | antiguedad_venta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16218 | 9302 | 98148 | 7 | 0.0 | 0 | 1.0 | 2.0 | 940.0 | 1890.0 | 0 | 1.0 | 0.0 | 6000.0 | 246500.0 | 2 | 8547.0 | 940.0 | 2015.0 | 61.0 |
| 16219 | 26872 | 98008 | 6 | 0.0 | 0 | 1.0 | 3.0 | 1270.0 | 1210.0 | 0 | 1.0 | 0.0 | 8000.0 | 475000.0 | 4 | 7875.0 | 1270.0 | 2014.0 | 55.0 |
| 16220 | 49558 | 98198 | 6 | 0.0 | 2 | 1.0 | 2.0 | 1170.0 | 1380.0 | 0 | 1.0 | 0.0 | 8925.0 | 175000.0 | 3 | 7440.0 | 1170.0 | 2014.0 | 103.0 |
| 16221 | 146 | 98117 | 6 | 120.0 | 0 | 1.0 | 2.0 | 860.0 | 980.0 | 0 | 1.0 | 0.0 | 2130.0 | 400000.0 | 4 | 2800.0 | 980.0 | 2014.0 | 96.0 |
| 16222 | 9396 | 98065 | 7 | 0.0 | 0 | 2.0 | 3.0 | 1950.0 | 2190.0 | 0 | 2.0 | 0.0 | 7263.0 | 409000.0 | 3 | 5900.0 | 1950.0 | 2014.0 | 7.0 |
| 16223 | 14466 | 98198 | 7 | 0.0 | 0 | 2.0 | 4.0 | 1780.0 | 1630.0 | 0 | 2.0 | 0.0 | 6000.0 | 175000.0 | 3 | 6000.0 | 1780.0 | 2014.0 | 23.0 |
| 16224 | 30056 | 98042 | 6 | 0.0 | 0 | 1.0 | 3.0 | 840.0 | 920.0 | 0 | 1.0 | 0.0 | 5525.0 | 191000.0 | 5 | 5330.0 | 840.0 | 2015.0 | 46.0 |
| 16225 | 5824 | 98106 | 7 | 550.0 | 0 | 2.0 | 3.0 | 1230.0 | 1780.0 | 0 | 1.0 | 0.0 | 6771.0 | 310000.0 | 3 | 6771.0 | 1780.0 | 2014.0 | 24.0 |
| 16226 | 16712 | 98038 | 7 | 0.0 | 0 | 2.0 | 3.0 | 1340.0 | 1060.0 | 0 | 2.0 | 0.0 | 3011.0 | 230000.0 | 3 | 3232.0 | 1340.0 | 2014.0 | 19.0 |
| 16227 | 237 | 98075 | 10 | 0.0 | 0 | 2.0 | 3.0 | 3240.0 | 2970.0 | 0 | 2.0 | 0.0 | 7857.0 | 800000.0 | 3 | 7857.0 | 3240.0 | 2014.0 | 20.0 |